Abstract

A key problem in model-based object recognition is selection, namely,
the problem of determining which regions in the image are likely to
come from a single object. In this paper we present an approach that
uses color as a cue to perform selection either based solely on image
data (data-driven), or based on the knowledge of the color description
of the model (model-driven). Specifically, the paper argues for the
specification of color in terms of color categories as being appropriate for
the task of selection. These color categories are used to develop a fast
region segmentation algorithm that extracts perceptual color regions
in images. The color regions extracted form the basis for performing
data and model-driven selection. Data-driven selection is achieved by
selecting salient color regions as judged by a color-saliency measure
that emphasizes attributes that are also important in human color
perception. The approach to model-driven selection, on the other hand,
exploits the color region information in the model to locate instances
of the model in a given image. The approach presented tolerates some
of the problems of occlusion, pose and illumination changes that make
a model instance in an image appear different from its original
description. Finally, the utility of color-based data and model-driven selection
is discussed in the context of reducing the search involved in
recognition.


Copyright © Massachusetts Institute of Technology, 1992 

This work is supported by an IBM Graduate Fellowship. It describes research done at the
Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. Support for
the laboratory's artificial intelligence research is provided in part by the Advanced Research
Projects Agency of the Department of Defense under Army contract number DACA76-85-C-0010,
Office of Naval Research contract N00014-85-K-0124 and under NSF Grant IRI-8900267.



1. SELECTION IN RECOGNITION 

A key problem in object recognition is selection, namely, the problem of
identifying regions in an image within which to start the recognition process,
ideally by isolating regions in an image that are likely to come from a single
object. Model-based object recognition methods that try to recognize which members
of the library of models are present in the scene usually use geometric features
such as points or edges and try to identify pairings between data and model features
that are consistent with a rigid transformation of the object model into image
coordinates. The large number of such pairings that need to be examined in cluttered
scenes leads to a combinatorially explosive search problem. It has been shown that
this search can be considerably reduced if recognition systems are equipped with
a selection stage where subsets of data features can be isolated that are likely to
come from a single object, thus allowing the search to be focused on those matches
that are more likely to lead to a correct solution [12]. This isolation can be
either based solely on image data (data-driven) or can incorporate knowledge
of the model (task-driven or model-driven). In addition, it is desirable to order
these subsets of data features such that the more promising ones, i.e., those that
are more likely to point to a single object, are explored first. This can not only
increase the likelihood of a good match being obtained earlier, but is also useful
when the task is to recognize as many objects as possible in a scene. Thus the
goals of selection in recognition are two-fold: to isolate areas in the image that
are likely to come from a single object, and to order these regions such that the
more promising ones are explored first. These goals of selection are different from
those of segmentation, where the problem is to partition the image into regions
that contain a single object. In selection, on the other hand, it is not essential to
isolate regions that totally contain a single object, nor is it necessary to partition
the entire image into different object-containing regions.

Even though selection can be of help in recognition, it has largely remained
unsolved. What makes selection so difficult? In the ideal case, if the appearance
of the desired object in the scene were known, and objects in the scene were nicely
separated and distinguishable from the background, and the illumination
conditions were known, then even simple methods that rely on intensity measurements
would work well to extract groups of features. But in reality, the appearance of the
object is not known. In addition, illumination conditions and surface geometries
of objects present in a scene can cause problems of occlusion, shadowing,
specularities, and inter-reflections in the image and make it difficult to interpret groups
of data features such as edges and lines. Previous approaches to selection have
focused on the problem of data-driven selection by grouping data features such as
edges, lines, or points based on constraints such as parallelism or collinearity [19],
distance and orientation [18], and regions enclosed by a group of edges [ ]. The
extent to which such grouping methods reduce the search in recognition depends
on the reliability of the groups produced (i.e., how many of them really come from
a single object). Maintaining the reliability of groups was found to be difficult
using constraints such as the ones listed above. So the general problem of
selection remains largely unsolved, as it is still not obvious how to reliably isolate
subsets of data features that will give clues that point to a single object. Thus
it appears that there is a need for a computational model of selection to explain
both data and task-driven selection.

We have been involved in building one such model that proposes that
selection be accomplished via an attention mechanism. Specifically, it is an attempt to
build a computational model of the visual attention phenomenon in humans and
to propose it as a selection mechanism for recognition. This involves the
isolation of two modes of human attentional behavior, namely attracted-attention and
pay-attention modes, to serve as paradigms for data-driven and model-driven
selection respectively. The attracted-attention mode of behavior is spontaneous and
commonly exhibited by an unbiased observer (i.e., with no a priori goals)
when some objects or some aspects of the scene attract his/her attention. The
pay-attention mode is a more deliberate behavior exhibited by an observer looking
at a scene with a priori goals (such as the task of recognizing an object, say) and
hence paying attention to only those objects/aspects of a scene that are relevant to
the goal. According to this model, therefore, data-driven selection can be achieved
by identifying regions in an image that attract attention (i.e., are salient)
with respect to some feature such as color or texture, while model-driven selection
can be achieved by paying attention to the model features (i.e., using the model
features to decide saliency of features in the image). While it is understandable
that paying attention to model features can help isolate areas in the image that
could contain subsets of data features that are likely to contain a single object
(the specific model object in this case), it is not immediately apparent how locating
salient regions can help in serving the goals of selection. Such a choice is,
however, motivated by the following considerations. First, it is often observed that an
object stands out in a scene because of some distinctive features that are usually
localized to some portion of the object. Therefore isolating distinctive regions is
more likely to point to a single object. Secondly, a distinctive region, if suitably
found, can help in limiting the number of candidate models from the library that
can potentially match the given data. This is especially true if only a few models
in the library satisfy the features that made the data region distinctive. Lastly, it
has often been observed that the first objects recognized in a scene are those that
attract an observer's attention [15]. Thus ordering the regions by distinctiveness
to decide which objects to recognize first seems to be in keeping with this
observation. Finally, a number of other approaches have also suggested that selection,
at least data-driven, can be performed based on some measure of saliency, such
as the structural saliency of curves [29], or saliency defined by local differences in
contrast, color, or size [8, 24, 28].

The above discussion indicates a framework in which data and model-driven
selection can be achieved. But how can salient regions be found in the image
independent of the model, and how can the object model affect the choice of
regions? The purpose of this paper is to present a method of selection by restricting
attention to one particular feature, namely, color. It shows how color regions
can be extracted from the image and how they can be used to perform
data-driven and model-driven selection. To give a flavor for the ensuing discussion,
Figures 1-3 show some examples of the results of data and model-driven selection
performed by our system. Figure 1a shows an image of a realistic scene with
shadows, inter-reflections, and consisting of many types of objects. The different
color regions found in this image are re-colored and shown in Figure 1b. The
four most salient color regions found are shown in Figures 1c-1f. The regions
span objects in the scene that are salient in color. Figures 2-3 show model-driven
selection using color, using the model object shown in Figure 2a and its color
description depicted in Figure 2c. The cluster of regions found to best satisfy the model color
region description using our algorithm for model-driven selection is shown in Figure
3d. The rest of the paper discusses how this kind of selection can be achieved
using color. It is organized as follows. In Section 2, we motivate the choice of
color as a feature to study selection, and outline the requirements imposed by
selection on any method for the extraction of color information. Based on these
guidelines, an approach to extracting color regions is presented in Section 3. In
Section 4 a measure for expressing the saliency of color regions is presented and
its relevance for data-driven selection is examined. Section 5 describes how
to perform model-driven selection based on the color regions. Finally, Section 6
summarizes our approach to color-based selection.

2. COLOR IN SELECTION 
2.1 Role of Color in Selection 

Color is known to be a strong cue in attracting an observer's attention.
Humans often also use color information to search for specific objects in a scene. It
therefore seems natural to use color as a cue for performing selection in computer
vision. But the strong motivation for using color in selection comes from the fact
that it provides region information and that, when specified appropriately, it can
be relatively insensitive to variations in normal illumination conditions and
appearances of objects [31]. A color region in the image almost always comes entirely
from a single object, giving, therefore, more reliable groups than other grouping
methods, and this can be useful for data-driven selection. Because objects tend to
show color constancy under most illumination conditions, color can be a stable cue
for most poses (appearances) of objects in scenes, thus making it also suitable for
model-driven selection.

2.2 Surface Color, Image Color, Perceptual Color

Although color is useful for selection, the problem of specifying the perceived
color of objects, that is, the color perceived by humans looking at an image of a
scene, has proven to be difficult in computer vision. Several artifacts such as
specularities (from shiny surfaces in the scene), inter-reflections, shading on surfaces,
and shadowing all make it difficult to recover the actual color of objects in the
scene from the image. Existing approaches have mainly focused on the problem of
color constancy, where the goal was to extract surface color, i.e., the surface reflectance
properties of objects, in order to obtain a stable perception of the color of an object
under varying illumination conditions. As this problem is under-constrained, most
methods make some assumptions about either the surface being imaged [23] or
about the illumination conditions [25, 14, 11, 32], or both [10]. Other methods
also exist that try to recover image color, i.e., the color of the objects as they
appear under the present illumination conditions, accounting separately for
artifacts such as specularities on shiny surfaces [22]. These methods, however, cannot
ensure that the color extracted matches the perceived color of objects.

For the purposes of selection, what kind of color information should be
extracted from regions? Is recovering image color sufficient or should one attempt
to recover surface color? We propose that for both data and model-driven
selection, it is sufficient if a region could be specified by its perceived color, and the
effects of artifacts such as specularities could be separately accounted for, as was done
by image color recovery methods. Using the perceptual color, two adjacent color
regions would be distinguished if their perceived colors were different, and this is
sufficient for data-driven selection. Because objects tend to obey color constancy
under most changes in illumination, their perceived color remains more or less the
same, thus making it sufficient also for model-driven selection. But can perceptual
color be quantified at all? In general, several effects such as simultaneous color
contrast and color filling have been known to influence human perception of color
[34]. Fortunately (as we will explain later), these factors are not very critical for
selection.

2.3 Perceptual Color Specification by Categories 

In this section we present a method for specifying the perceptual color of image
regions from the colors of their constituent pixels. The color of pixels in images
is described by a triplet <R,G,B> (called specific color henceforth), representing
the components of image intensity at that point along three wavelengths (usually
red, green and blue as dominant wavelengths to correspond to the filters used in
the color cameras). When all possible triples are mapped into a 3-dimensional
color space with axes standing for pure red, green and blue respectively, we
obtain a color space that represents the entire spectrum of computer recordable
colors. Such a color space must, therefore, be partitionable into subspaces where
the color remains perceptually the same, and is distinctly different from that of
neighboring subspaces. Such subspaces can be called perceptual color categories.
Now each pixel in the image maps to a point in this color space and hence will fall
into one of these categories. The perceptual color of this pixel can, therefore, be
specified by this color category. To obtain the perceived colors of regions from the
perceptual color of their constituent pixels, we observe the following. Although
the individual pixels of an image color region may show considerable variation in
their specific colors, the overall color of the region is fairly well described by the
color of the majority of pixels (called dominant color henceforth). Therefore, the
perceived color of a region can be specified by the color category corresponding to
the dominant color in the region.
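
As a minimal sketch of the dominant-color rule above, assume that every pixel of a
region has already been assigned a category label. The function name and the example
labels below are hypothetical and serve only as illustration; ties are resolved in
favor of the label seen first.

```python
from collections import Counter


def dominant_category(pixel_categories):
    """Return the category label shared by the largest number of pixels.

    pixel_categories: iterable of hashable category labels, one per pixel
    of a single region (hypothetical input format for this sketch).
    """
    counts = Counter(pixel_categories)
    if not counts:
        raise ValueError("region has no pixels")
    # most_common(1) returns [(label, count)] for the most frequent label.
    label, _count = counts.most_common(1)[0]
    return label


# A region whose pixels vary in specific color but are mostly "red".
region = ["red", "red", "dark_red", "red", "white", "red"]
print(dominant_category(region))
```

The region is thus described by one label even though a minority of its pixels
(here the darker and the specular ones) fall into other categories.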

The category-based specification of perceptual color (of pixels or regions) is a
good compromise between choosing the specific color (which is extremely unstable
with respect to changes in illumination conditions, etc.) and surface color (whose
recovery is hard). Since the categories indicate the perceptual color, they have
the same beneficial effect as recovering perceptual color, on both data and
model-driven selection, such as giving a reliable segmentation of the image into regions
and being stable under changes in illumination conditions. In addition, since the
perceptual categories depend on the color space and are independent of the
image, they can be found in advance and stored in, say, a look-up table. Finally,
a category-based description is in keeping with the idea of perceptual
categorization that has been explored extensively through psychophysical studies [4, 5, 2 ].
These studies concluded that although humans can discriminate between several
thousand nuances of colors, psychophysically, we seem to partition the color space
into relatively few distinct qualitative color sensations or categories.

2.4 Categorization of Color Space 

The above discussion argued for the viability of an approach that recovers
a color to within a category. Before this can be turned into a
method of color recovery one needs to address the issue of how such categories
may be found. Previous work on color categorization involved experiments on
naming the color using a limited vocabulary, or identifying colors using the Munsell
color charts [34]. But for computational color recovery, we need a way to convert
the camera recordable red, green and blue components of colors into computer
recordable perceptual color categories. This was done by performing some rather
informal but extensive psychophysical experiments that systematically examined
a color space and recorded the places where qualitative color changes occur, thus
determining the number of distinct color categories that can be perceived. For
this, the hue-saturation-value color space was used as it specifies a given color
in terms of its hue, purity and brilliance, attributes that have been found to
give a perceptual description of color [20]. The details of these experiments are
described in Appendix A and will not be elaborated here, except for the
following. The entire spectrum of computer recordable colors ( ) was
quantized into 7200 bins corresponding to a 5 degree resolution in hue and 10
levels of quantization of saturation and intensity values (see Figure 7). The color
in each such bin was then observed by displaying a mondrian (a uniform color patch)
of that color on a monitor screen and observing it under dark room conditions with
appropriate monitor calibration. From our studies, we found that about 220 different
color categories were sufficient to describe the color space. The color category
information was then summarized in a color-look-up table. Although it is true
that a finer level of quantization would have yielded more categories, a smaller set
is actually more useful since it gives a reasonably coarse description of the color
of a region, thus allowing it to remain the same for some variations in imaging
conditions. In fact, by the above method we can also determine which categories
can be grouped to give an even rougher description of a particular hue. This was
done and stored in a category-look-up table to be indexed using the color categories
given by the color-look-up table.
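
The quantization described above can be sketched as follows. This is only an
illustration under stated assumptions: the input is taken to be an 8-bit <R,G,B>
triple, the conversion uses Python's standard `colorsys` module rather than the
conversion of [9], and the function names and the bin ordering are hypothetical.
The roughly 220 category labels themselves came from the psychophysical experiments
and are not reproduced here; the sketch stops at the bin index that would be used
to address the color-look-up table.

```python
import colorsys

HUE_STEP_DEG = 5                      # 5 degree resolution in hue
HUE_BINS = 360 // HUE_STEP_DEG        # 72 hue bins
SAT_LEVELS = 10                       # 10 levels of saturation
VAL_LEVELS = 10                       # 10 levels of intensity (value)
NUM_BINS = HUE_BINS * SAT_LEVELS * VAL_LEVELS   # 72 * 10 * 10 = 7200


def hsv_bin(r, g, b):
    """Quantize an 8-bit <R,G,B> triple into (hue_bin, sat_bin, val_bin)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # Clamp so that the extreme values 1.0 fall into the last bin.
    hue_bin = min(int(h * 360.0) // HUE_STEP_DEG, HUE_BINS - 1)
    sat_bin = min(int(s * SAT_LEVELS), SAT_LEVELS - 1)
    val_bin = min(int(v * VAL_LEVELS), VAL_LEVELS - 1)
    return hue_bin, sat_bin, val_bin


def bin_index(r, g, b):
    """Flatten the three bin coordinates into one index in [0, NUM_BINS)."""
    hue_bin, sat_bin, val_bin = hsv_bin(r, g, b)
    return (hue_bin * SAT_LEVELS + sat_bin) * VAL_LEVELS + val_bin


print(NUM_BINS, bin_index(255, 0, 0))
```

A color-look-up table in this layout would simply be an array of 7200 category
labels indexed by `bin_index`.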

3. COLOR REGION SEGMENTATION 

The previous section described how to specify the color of regions, after they
have been isolated. But the more crucial problem is to identify these regions.
In this section, we show that the perceptual categorization principle can be used
to determine which pixels can be grouped to form regions in an image. If each
surface in the scene were a mondrian, then all its pixels would belong to a single
color category, so that by grouping spatially close pixels belonging to a category,
the desired segmentation of the image can be obtained. But real surfaces being
hardly mondrians, it is rare that pixels of a region from such surfaces all belong
to the same color category. They could show considerable variation in color with
bright and dark pixels intermixed, and with possibly spurious pixels also being
present. We now analyse some of the color variations across an image that can
result from imaging a colored surface in the scene.

3.1 Variation of Color Across an Image of a 3D-Surface 

In this section we use some assumptions to show that the color variation across
an image of a surface is mostly in intensity. When a surface is imaged, the light
falling on the image plane (image irradiance) is related to the physical properties
of the scene being imaged via the image irradiance equation

$$I(\lambda, r) = \rho(\lambda, r)\, F(k, n, s)\, E(\lambda, r), \tag{1}$$

where $\lambda$ is the wavelength, $r$ is the spatial coordinate of the surface point
(identified here with its projection in the image), $E(\lambda, r)$ is the intensity of
the ambient illumination, $\rho(\lambda, r)$ is the component
of surface reflectance that depends only on the material properties of the surface
(and hence specifies its surface color), while $F(k, n, s)$ is the component of surface
reflectivity that depends on surface geometry, with $k$, $s$, $n$ being the viewer
direction, the source direction and the surface normal respectively. Although the image
irradiance equation assumes that all surfaces in a scene reflect light governed by
a single reflectivity function, we can easily reinterpret this equation to represent the
image irradiance of a single surface. Under the assumption of a single light source,
the surface illumination $E(\lambda, r)$ can be separated as a product of two terms
$E_1(\lambda)$ and $E_2(r)$, and since $F(k, n, s)$ is a function of position $r$ it can be
expressed as $F(r)$. Then the image irradiance equation can be re-written as

$$I(\lambda, r) = \rho(\lambda, r)\, F(r)\, E_1(\lambda)\, E_2(r). \tag{2}$$

The surface reflectance and hence the resulting appearance of a surface is
determined by the composition as well as the concentration of the pigments of the
material constituting the surface. For most surfaces, the composition of the
pigments can be considered independent of their concentration so that the spectral
reflectance $\rho(\lambda, r)$ can be written as a product of two terms $\rho_1(\lambda)$
and $\rho_2(r)$. Note that this assumption is less restricting than the assumption of
homogeneity that has been used before [14]. With this simplification (and grouping the
product of terms dependent on $\lambda$ and $r$ separately, i.e.,
$L(\lambda) = \rho_1(\lambda) E_1(\lambda)$ and $H(r) = \rho_2(r) F(r) E_2(r)$), the image
irradiance equation becomes

$$I(\lambda, r) = H(r)\, L(\lambda). \tag{3}$$

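Now, if we consider the filtered version of this signal, i.e., the image irradiance in
three channels, say the red, green and blue channels with their associated transfer
functions $h_R(\lambda)$, $h_G(\lambda)$, $h_B(\lambda)$, the specific color at each pixel
location $r$ is specified by the triple <R(r),G(r),B(r)> where

$$R(r) = \int_0^\infty I(\lambda, r)\, h_R(\lambda)\, d\lambda = H(r) \int_0^\infty L(\lambda)\, h_R(\lambda)\, d\lambda = H(r)\, R_1 \tag{4}$$

$$G(r) = \int_0^\infty I(\lambda, r)\, h_G(\lambda)\, d\lambda = H(r) \int_0^\infty L(\lambda)\, h_G(\lambda)\, d\lambda = H(r)\, G_1 \tag{5}$$

$$B(r) = \int_0^\infty I(\lambda, r)\, h_B(\lambda)\, d\lambda = H(r) \int_0^\infty L(\lambda)\, h_B(\lambda)\, d\lambda = H(r)\, B_1. \tag{6}$$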

This shows that under the given assumptions (which include non-homogeneous
surfaces), the color of a surface can vary only in intensity. In practice, even when
the separability assumption on reflectance is not satisfied, or there is more than
one light source in the scene, the general observation is that the intensity and
purity of colors are affected, but the hue still remains fairly constant. In terms
of categories, this means that different pixels in a surface belong to compatible
categories, i.e., have the same overall hue but vary in intensity and saturation.
Conversely, if we group pixels belonging to a single category, then each physical
surface is spanned by multiple overlapping regions belonging to such compatible
color categories. These were the categories that were grouped in the
category-look-up-table mentioned in Section 2.4. The next section describes how these concepts
can be put together to give a color image segmentation algorithm.
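
The consequence of equations (4)-(6) can be checked numerically with a small
sketch: scaling a fixed triple <R_1,G_1,B_1> by a position-dependent factor H(r)
changes only the value (intensity) component in hue-saturation-value coordinates.
The base triple and the shading factors below are arbitrary illustrative values,
the helper names are hypothetical, and the conversion uses Python's standard
`colorsys` module as a stand-in for the conversion of [9].

```python
import colorsys
import math


def hsv_under_shading(base_rgb, shading):
    """HSV of the pixel <H*R_1, H*G_1, H*B_1> for a shading factor H."""
    r, g, b = (shading * c for c in base_rgb)
    return colorsys.rgb_to_hsv(r, g, b)


def hue_and_saturation_constant(base_rgb, shadings, tol=1e-9):
    """True if hue and saturation agree across all the shading factors."""
    h0, s0, _ = hsv_under_shading(base_rgb, shadings[0])
    return all(
        math.isclose(h, h0, abs_tol=tol) and math.isclose(s, s0, abs_tol=tol)
        for h, s, _ in (hsv_under_shading(base_rgb, H) for H in shadings)
    )


base = (0.8, 0.3, 0.2)          # <R_1, G_1, B_1>, arbitrary example values
shadings = [1.0, 0.5, 0.25]     # H(r) at three pixel locations, arbitrary
print(hue_and_saturation_constant(base, shadings))
for H in shadings:
    print(H, hsv_under_shading(base, H)[2])   # value scales with H
```

Under the exact model of equation (3), saturation is also unchanged; the variation
in saturation mentioned above arises in practice when the separability or
single-source assumptions do not hold.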

3.2 Color Region Segmentation Algorithm

The algorithm for color image segmentation performs the following steps. (1) 
First, it maps all pixels to their categories in color space. (2) It then groups pixels 
belonging to the same category, (3) and finally merges overlapping regions in the 
image that are of compatible color categories. 

1. Mapping pixels to categories: This is done by a simple indexing of the
color-look-up-table by the color of the pixel specified in terms of its hue, saturation, and
brightness components. These components can be derived from the specific color
as described in [9]. This step takes time = O(N) where N is the size of the image.

2. Grouping pixels of same category: The image is divided into small non-overlapping
bins of fixed size (say, 8 x 8) and the color categories found in the bins are recorded.
The size of the bin can be chosen based on expectations about the average size
of color regions found in natural scenes. Each bin thus has a list of color
categories summarizing the pixel color information in the bin. Neighboring bins that
contain a common color category can be grouped to give a connected component
representing an image region of that color category. Since a bin has several color
categories, it belongs to several connected components that overlap. The actual
grouping algorithm we used is a sequential non-recursive labeling algorithm that
simultaneously assembles all the overlapping connected components using the
category description in the bins. This algorithm is an extended version of the labeling
algorithm for binary images described earlier [13], and uses the union-find data
structure to efficiently merge category labels into connected components, taking
time = O(k^2 M) where M = number of windows, and k = maximum number of
categories present in the window (= O(1) for small window sizes, e.g., 8 x 8). The
resulting labels are propagated back to the pixels to give the precise boundaries
of color regions of single color categories. The color of the region is then specified
by the color category and specific color that is the dominant color in the region as
described in Section 2.3.

3. Merging overlapping regions: The general problem of determining which regions
overlap in the image can be a computationally intensive operation as it involves
determining which polygonal regions intersect and finding their regions of
intersection. But by using the bin-wise representation of connected components, we can
detect and combine overlapping regions with greater ease. From the discussion
in Section 3.1, a shaded region maps to categories in color space that are
compatible, i.e., have the same overall hue. The categories that are compatible are
available from the category look-up-table described in Section 2.4. To find all such
regions that have compatible categories and overlap in image space, the algorithm
examines each window of the image to see if it contains the interior portions of
regions of compatible color categories. Such overlap regions are grouped as in step
2. This step again takes O(k^2 M) time. Finally, the window-level color labels are
propagated back to the corresponding pixels to give an accurate localization of the
color region boundaries.

The algorithm for color image segmentation thus makes only a constant number 
of passes through the image, each being linear in the size of the image. 
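
The three steps above can be sketched in a few lines, under simplifying
assumptions that should be read as ours and not as the labeling algorithm of [13].
The input is a 2-D list of per-pixel category labels (step 1 is taken as already
done), `hue_group` is a hypothetical stand-in for the category look-up table, bins
are linked through their right and lower neighbors only, and step 3 is reduced to
merging compatible categories that occur in the same bin without testing for
region interiors.

```python
def find(parent, x):
    """Return the representative of x, halving paths along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    """Merge the components containing a and b."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra


def segment(categories, hue_group, bin_size=8):
    """Return a 2-D list of region ids, one per pixel.

    categories: 2-D list of per-pixel category labels.
    hue_group:  dict mapping each category to its coarse hue group
                (hypothetical stand-in for the category look-up table).
    """
    rows, cols = len(categories), len(categories[0])
    n_br = (rows + bin_size - 1) // bin_size
    n_bc = (cols + bin_size - 1) // bin_size

    # Step 2a: record the categories present in each bin.
    bins = [[set() for _ in range(n_bc)] for _ in range(n_br)]
    for y in range(rows):
        for x in range(cols):
            bins[y // bin_size][x // bin_size].add(categories[y][x])

    # One union-find node per (bin row, bin column, category) occurrence.
    parent = {(i, j, c): (i, j, c)
              for i in range(n_br) for j in range(n_bc) for c in bins[i][j]}

    for i in range(n_br):
        for j in range(n_bc):
            for c in bins[i][j]:
                # Step 2b: link neighboring bins sharing category c.
                for ni, nj in ((i, j + 1), (i + 1, j)):
                    if ni < n_br and nj < n_bc and c in bins[ni][nj]:
                        union(parent, (i, j, c), (ni, nj, c))
                # Step 3 (simplified): merge compatible categories in a bin.
                for d in bins[i][j]:
                    if d != c and hue_group[d] == hue_group[c]:
                        union(parent, (i, j, c), (i, j, d))

    # Propagate the bin-level labels back to the pixels.
    ids, out = {}, []
    for y in range(rows):
        row = []
        for x in range(cols):
            root = find(parent, (y // bin_size, x // bin_size, categories[y][x]))
            row.append(ids.setdefault(root, len(ids)))
        out.append(row)
    return out


groups = {"red": "R", "dark_red": "R", "blue": "B"}
image = [["red", "dark_red", "blue", "blue"],
         ["red", "red", "blue", "blue"]]
print(segment(image, groups, bin_size=2))
```

In this toy image the dark red pixel is absorbed into the red region because both
categories share a hue group, while the blue pixels form a separate region.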

3.3 Handling Specularities 

The above algorithm segments the image into regions according to their per¬ 
ceived color. As we described before, this is sufficient for data-driven selection. 
But for model-driven selection such a description needs to be augmented with the 
knowledge of artifacts that occur in the image such as specularities, shadows, or 
inter-reflections. Such artifacts can cause a model region to appear fragmented.
For example, a sharp streak of specularity on the surface can cleave its image into 
two regions. If these artifacts could be identified and corrected, this can improve 
the effectiveness of a color-based model-driven selection system. We now discuss 
how one of these artifacts, namely, specularities, can be handled once the color 
regions have been isolated. Specularities are present in regions produced by ob¬ 
jects in the scene having shiny surfaces, such as metallic objects and dielectrics. 
These specularities have a central bright portion that appears white in most il¬ 
lumination conditions (bright sunlight, day light, tube light) and tapers off near 
the specularity boundary merging into the rest of the body color. Such specular 
regions and their adjacent colored regions when projected into a color space form 
characteristic clusters such as the skewed T described in [21]. These clusters can, 
therefore, be analysed to detect and remove highlights using the method described
in that paper.

3.4 Results

Figures 4-6 demonstrate the color region segmentation algorithm. Figure 4a
shows a 256 x 256 pixel image of a color pattern on a plastic bag. The folding
on the bag and its plastic material together give a glossy appearance in the image 
as can be seen in the big S and Y. The result of step-2 of the algorithm is shown 
in Figure 4b, and there it can be seen that the glossy portions on the big blue Y 
and the red S cause overlapping color regions. These are merged in step 3 and the 
result is shown in Figure 4c. As can be seen in the figure, the algorithm achieves 
a fairly good segmentation of the scene for such surfaces. Figure 5 shows another 
image consisting of colored pieces of cloth with the textured region having several 
small colored regions within it. The results of the algorithm (Figure 5c) show that 
even such colored regions can be reliably isolated. Another example (Figure 1) of
color region extraction was mentioned earlier in Section 1. Notice in the segmented
image of Figure 1b that adjacent objects of the same perceptual color are merged
(grey books). This is to be expected because the grouping of regions is based on
color information alone.

4. COLOR-BASED DATA-DRIVEN SELECTION

The segmentation algorithm described above gives a large number of color 
regions. Some of these may span more than one object, while some come from the 
scene clutter rather than objects of interest in the scene. It would be useful for the 
purposes of recognition to order and consider only some of these regions so that by 
isolating data subsets from such regions, the search can be focused on key groups 
of features thus excluding much of the scene clutter. Based on the rationale given 
in Section 1, we propose that the color regions be ordered by their saliency, i.e., 
by how distinctive they appear. The method of color-based selection, therefore, is 
to extract color regions from the image, order them based on a measure of color- 
saliency and then select a few most salient regions to be given to any recognition 
system. In this section we first describe a measure of expressing color saliency, and 
then examine the utility of salient-region selection in recognition. 


4.1 Finding Salient Color Regions in Images 

In trying to express distinctiveness, one encounters the question: Is
distinctiveness expressible at all? In general, any judgement of distinctiveness has both a
sensory and a subjective component. Thus, for example, while most of us can
perceive brighter colors more easily than duller colors, the judgement of which of two
hues of the same brightness and saturation are more salient can be subjective. The 
aim here is to focus on the sensory component of distinctiveness and hence extract 
properties of regions that are general enough to be perceived by most observers. 
Accordingly, we propose that the saliency of a color region be composed of two 
components, namely, self-saliency and relative saliency. Self-saliency determines 
how conspicuous a region is on its own and measures some intrinsic properties 
of the region, while relative saliency measures how distinctive the region appears 
when there are regions of competing distinctiveness in the neighborhood. 

In order to develop such a measure for color-region saliency one has to ask 
the following questions: What features in regions determine their saliency? How 
can they be measured to reflect our sensory judgments? Finally, how can they 
be combined to give the saliency measure? We now address these questions and
derive a measure of color-saliency.


4.1.1 Features used for measuring self and relative saliency 

Since the saliency of a color region depends on the region features used, they
must be carefully selected. Such features should be: (i) perceptually important,
(ii) easily measurable, and (iii) fairly general, to avoid subjective bias.

1. Color: The color of a region is an intrinsic property and affects a region's
self-saliency. It is specified by (s(R), v(R)), where s(R) = saturation or purity of
the color of region R, and v(R) = brightness, and 0 ≤ s(R), v(R) ≤ 1.0. The hue of
colors is not considered, to avoid subjective bias.

2. Region size: The size of a region is again an intrinsic property and affects its
self-saliency. It is chosen as a feature based on the observation that regions that
are either very small in extent, or that are large enough to cover the entire field of
view, do not often attract our attention. Also, very large regions can potentially
span more than one object, making them unsuitable for selection. The size feature
is expressed by the normalized size r(R) = Size(R)/Image-size.

3. Color contrast: The color contrast a region shows with its neighbors affects
its relative saliency. The rationale behind choosing color contrast is that even if a
region has an interesting intrinsic color, it may not be distinctive if all its neighbors
also have equally interesting colors, unless it shows the greatest contrast. It is
difficult to express color contrast in a numerical measure that can account for
the variations in an observer's judgement with the conditions of observation, size,
shape, and absolute color of the stimuli [34]. In the color contrast measure we
chose, we augmented an empirical color difference formula, which predicts the
observed color differences, with the knowledge of the hues of the colors derived from
their categorical representation. Specifically, the following difference formula
d(C_R, C_T) was used to measure the color difference between two color regions R
and T with specific colors C_R = (r_0, g_0, b_0)^T and C_T = (r, g, b)^T:

\[
d(C_R, C_T) = \sqrt{\left(\frac{r_0}{r_0 + g_0 + b_0} - \frac{r}{r + g + b}\right)^2
            + \left(\frac{g_0}{r_0 + g_0 + b_0} - \frac{g}{r + g + b}\right)^2}
\tag{7}
\]


As this measure does not explicitly take into account the hues of the colors, the
color category-based representation is used to ascertain whether the hues of the two
regions are different, and then the extent of difference is judged using d(C_R, C_T) in
such a way that the contrast between regions of different hue is emphasized. This
allows the measure to handle simultaneous color contrast to some extent. The
measure is given by c(R, T) below:


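\[
c(R, T) =
\begin{cases}
k_1 \, d(C_R, C_T) & \text{if } R \text{ and } T \text{ are of the same hue} \\
k_2 + k_1 \, d(C_R, C_T) & \text{otherwise}
\end{cases}
\tag{8}
\]

where k_1 is a scaling constant and k_2 = 0.5, so that 0 ≤ c(R, T) ≤ 1.0.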

4. Size contrast: The size contrast is a feature for determining relative saliency and
is chosen because it determines if a region is mostly in the background or in the
foreground. The size contrast of a region R with respect to an adjacent region T
is simply the relative size (area) and is given by

\[
t(R, T) = \min\left(\frac{\mathrm{size}(R)}{\mathrm{size}(T)},
                    \frac{\mathrm{size}(T)}{\mathrm{size}(R)}\right)
\tag{9}
\]


Since a region R has several neighboring regions in general, the color contrast
c(R) and size contrast t(R) of a region R are measured relative to a best neighbor
T_best for each region, so that c(R) = c(R, T_best) and t(R) = t(R, T_best). T_best is
the neighboring region that is ranked the highest when all neighbors are sorted first
by size, then by extent of surround, and finally by contrast (size or color contrast
as the case may be).
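
The two contrast features can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the color difference is taken to be the Euclidean distance between normalized (r, g) chromaticities, the scaling constant K1 is a hypothetical value chosen so that the contrast stays within [0, 1], and the function names are ours rather than the original implementation's.

```python
import math

K2 = 0.5                   # offset added when the two hues differ (from the text)
K1 = 0.5 / math.sqrt(2.0)  # ASSUMPTION: scaling that keeps c(R, T) <= 1.0


def chromaticity(rgb):
    """Return the normalized (r, g) chromaticity of an (R, G, B) triple."""
    r, g, b = rgb
    total = r + g + b
    if total == 0:
        return (0.0, 0.0)  # ASSUMPTION: black is mapped to the origin
    return (r / total, g / total)


def color_difference(c_r, c_t):
    """Distance between two colors in normalized chromaticity space."""
    r0, g0 = chromaticity(c_r)
    r1, g1 = chromaticity(c_t)
    return math.hypot(r0 - r1, g0 - g1)


def color_contrast(c_r, c_t, same_hue):
    """Color contrast c(R, T); a hue difference adds the offset K2."""
    scaled = K1 * color_difference(c_r, c_t)
    return scaled if same_hue else K2 + scaled


def size_contrast(size_r, size_t):
    """Size contrast t(R, T): the smaller of the two area ratios."""
    return min(size_r / size_t, size_t / size_r)
```

With these definitions, two regions of equal area give a size contrast of 1, and the value falls towards 0 as the areas become more unequal. Whether two regions share a hue is supplied by the caller, since that decision comes from the color-category representation.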


4.1.2 Combining features for self-saliency:

To determine self-saliency from the chosen features, they are weighted
appropriately to reflect their importance. The self-saliency measure chosen
emphasizes purer and brighter colors over darker and duller colors by choosing the
weighting functions for saturation and brightness as f_1(s(R)) = 0.5 s(R) and
f_2(v(R)) = 0.5 v(R), respectively. The size of a region is given a non-linear weight
to de-emphasize both very small and very large regions, as they do not often attract
our attention. The corresponding weighting function has sharp as well as smoothly
rising and falling phases determined by the breakpoints t_1, t_2, t_3, t_4 as shown in
Figure 8a and the equation below.¹ Here n stands for the region size r(R).


\[
f_3(n) =
\begin{cases}
\dfrac{\ln(1 - n)}{c_1} & 0 \le n \le t_1 \\
1 - e^{-c_2 n} & t_1 < n \le t_2 \\
s_2 - c_3 \ln(1 - n + t_2) & t_2 < n \le t_3 \\
s_3 \, e^{-c_4 (n - t_3)} & t_3 < n \le t_4 \\
0 & t_4 < n \le 1.0
\end{cases}
\tag{10}
\]

where t_1 = 0.1, t_2 = 0.4, t_3 = 0.5, t_4 = 0.75, s_1 = 0.8, s_2 = 1.0, s_3 = 0.7,
s_4 = 10^{-3}, and c_1 = ln(1 - t_1)/s_1, c_2 = -ln(1 - s_1)/t_1,
c_3 = (s_2 - s_3)/ln(1 + t_2 - t_3), c_4 = ln(s_3/s_4)/(t_4 - t_3), and n = size of
region R = r(R).

4.1.3 Combining features for relative saliency: 

Once again, the chosen features are weighted appropriately to determine relative
saliency. The color contrast is weighted linearly by a function f_4(c(R)) = c(R),
to emphasize regions showing greater contrast. The relative size is exponentially
weighted by a function f_5(t(R)) = 1 - e^{-12 t(R)}, to favor situations in which a
region and its best neighbor have approximately the same size.²


4.1.4 Finding self and relative saliency 

Once the various features determining self and relative saliency are
appropriately weighted, they reinforce each other so that the self and relative
saliencies can be given by simple additive combinations of their individual features.
The self-saliency of a region R, denoted by SS(R), is given as f_1(s(R)) + f_2(v(R)) +
f_3(r(R)). Similarly, the relative saliency of the region R, RS(R), is given by
f_4(c(R)) + f_5(t(R)). Finally, the overall saliency of a region R is expressed by a
linear combination of self and relative saliency as SS(R) + RS(R), using the following
rationale. Any combination method should be flexible enough to allow a region to be
declared salient if it shows good contrast (i.e., high relative saliency) even though
it may not be interesting on its own. Conversely, a region that is interesting on its
own but fails to become interesting in the presence of neighboring regions should not
be chosen. On the basis of these observations alone, nonlinear combining methods
such as (SS(R) * RS(R)) or max(SS(R), RS(R)) are not suitable. If a region is
both interesting on its own as well as in the presence of other regions in the scene,
then it must be given more importance. All three criteria are satisfied when the
two saliency components are linearly combined. The color saliency of a region R
is therefore given by

\[
\text{Color-saliency}(R) = f_1(s(R)) + f_2(v(R)) + f_3(r(R)) + f_4(c(R)) + f_5(t(R))
\tag{11}
\]

¹ Such a function, along with the thresholds and rates of change, was empirically
derived from informal psychophysical experiments (whose details will not be
elaborated here) performed using color regions of various sizes.

² Once again, this function was obtained by performing informal psychophysical
experiments.


The saliency measure described above does not completely take into account
all the perceptual effects of simultaneous color contrast, color-filling, etc. Because
such effects do not greatly undermine a region that is already very outstanding
(very salient), and because saliency is being used only to rank the regions, we have
ignored these effects.

The color regions in the image can now be ordered using the saliency measure
and a few most significant regions can be retained for selection (called salient
regions, henceforth). The number of salient regions to be retained can be determined
when the selection mechanism is integrated with a recognition system to perform
a specific task, and is therefore left unspecified here.
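
The additive combination and the ranking step can be sketched as follows. This is a minimal illustration, not the original implementation: the `Region` record and its field names are hypothetical, the size weight f_3(r(R)) is assumed to be precomputed, and the exponent -12 in the size-contrast weighting is our reading of the text.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    saturation: float      # s(R), in [0, 1]
    brightness: float      # v(R), in [0, 1]
    size_weight: float     # f_3(r(R)), assumed to be precomputed
    color_contrast: float  # c(R), measured against the best neighbor
    size_contrast: float   # t(R), measured against the best neighbor


def self_saliency(region):
    """SS(R): weighted saturation and brightness plus the size weight."""
    return 0.5 * region.saturation + 0.5 * region.brightness + region.size_weight


def relative_saliency(region):
    """RS(R): linear color contrast plus exponentially weighted size contrast."""
    # ASSUMPTION: the exponent -12 follows our reading of the text.
    return region.color_contrast + (1.0 - math.exp(-12.0 * region.size_contrast))


def color_saliency(region):
    """Overall saliency: the linear combination SS(R) + RS(R)."""
    return self_saliency(region) + relative_saliency(region)


def most_salient(regions, k):
    """Return the k regions with the highest color saliency, best first."""
    return sorted(regions, key=color_saliency, reverse=True)[:k]
```

Calling `most_salient(regions, 4)` would retain four regions, which is the number used in the experiments reported later; in general k is left to the recognition task.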


4.1.5 Results 

We now illustrate the ranking of regions produced by the color saliency measure
derived above. Figures 1c-1f show the four most distinctive regions found by
applying the color-saliency measure to all the color regions extracted from the
scene shown in Figure 1a. Figures 4d-4f, 5d-5f, and 6c-6f show the few most salient
regions found in their respective scenes. In the experiments done so far, the
color-saliency measure was found to select fairly large bright-colored regions that
showed good contrast with their neighbors, and appeared perceptually significant.

4.2 Use of Salient Color-based Selection in Recognition

Data-driven selection based on salient color regions is primarily useful when
the object of interest has at least one of its regions appearing salient in the given
scene. In such cases, the search for data features that match model features can
be restricted to the salient regions, thus avoiding needless search in other areas of
the image. By selecting salient color regions, we obtain a small number of groups
(a region is itself a group), containing several features. It was shown in [7] that
such large-sized groups are useful for indexing, i.e., to determine which regions
from models in a library could correspond to a given group. But when the task
is to recognize a single object, it is desirable to have small-sized groups. For this,
existing grouping techniques can be applied to the data features found within the
color regions to obtain reliable small-sized groups.

We now estimate the search reduction that can be achieved with such a selection
mechanism. Let (M, N) = total number of features (such as edges, lines, etc.) in
the model and image respectively. Let (M_R, N_R) = total number of color regions
in the model and image respectively. Let N_s = number of salient regions that are
retained in an image. Let g = average size of a group of data features, within a
model or image. Let (G_M, G_N) = number of groups formed (using any existing
grouping scheme) in the model and image respectively. Finally, let G_{N_i} be the
number of groups in the salient image region i. Using the alignment method of
recognition [16], at least three corresponding data features are needed to solve
for the pose (appearance) of the model of a rigid object in the image. If no
selection of the data features is done, then the brute-force search required to try
all possible triples is O(M^3 N^3). If selection is done by only grouping methods
(i.e., without color region selection), then the number of matches that need to be
tried is O(G_M G_N g^3 g^3), since only triples within groups need to be tried. But as
we mentioned before, grouping methods often make mistakes, so that not all groups
contain features belonging to a single object. In at least one such study [6], out of
the 150 or so groups isolated, about 83 groups actually came from single objects.
Most of the remaining 67 groups would not yield any consistent match and would
represent fruitless search. Consider the case when grouping of data features is done
within all the color regions. With this, the grouping is more reliable, and also, the
number of groups is smaller (as groups straddling regions are not considered),
so that the overall effect is to reduce the search. For example, with M = 200,
N = 3000, g = 7, and G_M = 30, G_N = 430 (these numbers are typical of indoor
scenes), the search reduction in going from 70% reliability in simple grouping to
> 95% reliability in grouping within color regions is ≈ 0.25 × 10^9, which is a
considerable improvement. Consider next the case when grouping is restricted to
salient color regions. The number of matches further reduces to
O(Σ_{i=1}^{N_s} G_{N_i} G_M g^3 g^3), since only the groups in the salient regions
need be tried.
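
The three match-count estimates above are simple enough to evaluate directly. The following Python sketch does so for the typical indoor-scene values quoted in the text; the function names are ours, and the per-region group counts passed to the last function are hypothetical inputs.

```python
def brute_force_matches(m, n):
    """All triples of model features against all triples of image features."""
    return m**3 * n**3


def grouping_matches(g_m, g_n, g):
    """Triples tried only within groups: G_M * G_N * g^3 * g^3."""
    return g_m * g_n * g**3 * g**3


def salient_region_matches(groups_per_salient_region, g_m, g):
    """Only groups inside the retained salient regions are tried."""
    return sum(groups_per_salient_region) * g_m * g**3 * g**3


if __name__ == "__main__":
    # Values quoted in the text as typical of indoor scenes.
    print(brute_force_matches(200, 3000))   # order of M^3 N^3
    print(grouping_matches(30, 430, 7))     # grouping alone
    # Hypothetical group counts for four retained salient regions.
    print(salient_region_matches([10, 5, 8, 7], 30, 7))
```

These are order-of-magnitude counts only; they ignore the reliability of the grouping, which the text treats separately.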

To obtain an estimate of the number of matches and time taken for matching in
real scenes when color-based selection is used, we recorded the number of regions
(obtained by applying the segmentation algorithm of Section 3), and the number
of data features within regions in some selected models and scenes (Figures 1 and
2 show typical examples of models and scenes tried). The regions were ordered
using the color saliency measure and the four most salient regions were retained.
Then search estimates were obtained using the above formulas, assuming a
grouping scheme that gives a number of groups within regions that is bounded
in terms of the number of features in a region. This is a good bound on the number of
parallel lines in the region. The results of such studies are shown in Table I. As can
be seen from this table, the number of matches is always smaller when salient
color regions are used for selection. But the ultimate utility of such a selection
mechanism can be accurately gauged only after it is integrated with a recognition
system. Current research is being directed towards this effort.

5. COLOR-BASED MODEL-DRIVEN SELECTION 

The previous section described a data-driven selection mechanism that was
meant for an object of interest having some salient color regions. But it may not
be of much help when the object of interest is not salient in color (but is salient
in some other domain, say texture) or is not salient at all. In such cases, the
color description of the model can be used to perform selection. We now present
one such color-based model-driven selection mechanism. Here, given a color
description of a model object, the task is to locate color regions that satisfy this
description. The use of model information to constrain the matching of model
features to image features is not new. Several model-driven search
techniques such as generalized Hough transforms [17], heuristic termination [12]
and focal features have evolved [2, 1, 3]. The emphasis in these methods was on
geometric constraints that can prune the search space during the matching stage
of recognition. The approach we present here, on the other hand, emphasizes
some global relational information about model color regions to prune the search
space prior to matching. It also provides possible correspondences between model
and image regions. Such a correspondence can further reduce the complexity of
recognition because the search for pairing model features to data features can be
restricted now to these corresponding regions rather than all image regions. Color
information in the model object has been used before to search for instances of
the object in the given image of a scene [31, 33]. These approaches represent the
model and image color information by color histograms and perform a match of
the histograms. Such approaches usually cause a lot of false positive identifications
and do not explicitly address some of the problems that arise in going from a model
object to its instance in a scene. Also, since they do not supply correspondence
between model and image regions, they are not as useful for reducing the search.

In order for any scheme for model-driven selection to be effective for reducing
the search in recognition, it must meet two requirements: (i) it must be sufficiently
selective to avoid many false positive identifications that cause needless
matches, and (ii) it must be sufficiently conservative to avoid many false negatives,
which cause recognition to fail when it should have succeeded. A selection scheme
can make false negatives if it does not adequately take into account the various
problems that arise in going from a model object to its image in the scene. An
object may not appear the same in the scene as it does in the model, because it has
undergone pose changes, or because it is occluded, or its colors appear different
in the current illumination conditions. In addition, artifacts such as specularities,
inter-reflections, and shadows may also cause changes in the appearance of the
object. So how can a model-driven selection mechanism meet these two
conflicting requirements? We now describe an approach to model-driven selection
that meets some of these requirements. It makes a particular choice of model
description and assumes that this is made available to it for selection. Since this
model description affects the way our approach formulates the color-based model-
driven selection problem, it is described first.

5.1 Model Description

The color region information in the model³ (in an image or view of the model,
that is) is represented as a region adjacency graph (RAG)

\[
M_G = \langle V_m, E_m, C_m, R_m, S_m, B_{rm}, B_{sm} \rangle
\tag{12}
\]

where V_m = color regions in the model, E_m = adjacencies between color regions,
C_m(u) = color of region u ∈ V_m, R_m(u, v) = relative size of region v with respect
to region u, S_m(u) = size of region u, B_rm = a bound on the relative size of
regions given by R_m, and B_sm = a bound on the absolute size of regions given by
S_m.
The above description exploits features of regions that tend to remain more or
less invariant in most scenes where the model appears. If the color of a model region
is specified by its color category, then as we discussed before it tends to remain
relatively stable (or changes in a predictable way) under variations in illumination
conditions, and pose changes. Similarly, the adjacency information between two
color regions tends to remain more or less invariant in the different appearances of
the object, as long as the two regions are visible in the given image and there are no
occlusions. Finally, the relative size of regions is preserved under changes of scale.
But it can undergo considerable changes if the pose of the object changes, as
when a region goes partially out of view. The bound on the relative size changes
in each pair of adjacent regions, B_rm, indicates the extent of pose changes that the
selection mechanism is expected to tolerate. Relative size changes can also occur
due to occlusions. By placing some loose bounds on the absolute size changes
given by B_sm, the model description restricts the changes that can be tolerated in
the presence of occlusions. For size changes in a region that go beyond the bounds,
that region will be considered no longer recognizable, and then the selection will
have to depend on the evidence for other model regions in the image.

³ The model description specifies a color view, that is, a range of 2D views of the
model in which one or more of the color regions described in the model are visible.
If the model has other views showing an entirely different set of color regions, then
they must be specified by separate color views.

This description is fairly rich and has some structural information about color
regions that can be used to restrict the number of false positives, and some
constraints on the relative and absolute size changes that can be used to restrict the
number of false negatives made by the selection mechanism.

Finally, the model description gives a way to analogously organize the color
region information in the image as an image region adjacency graph
I_G = <V_G, E_G, C_G, R_G, S_G>, where each term has a meaning analogous to
V_m, E_m, C_m, R_m, S_m respectively.
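
As an illustration of the description above, a region adjacency graph can be held in a small Python structure. This is only a sketch: the class and method names are hypothetical, regions are identified by arbitrary hashable labels, colors are stored as category names, and the size bounds are left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class RegionAdjacencyGraph:
    color: dict                              # C(u): region label -> color category
    size: dict                               # S(u): region label -> region size
    edges: set = field(default_factory=set)  # E: unordered pairs of adjacent regions

    def add_adjacency(self, u, v):
        """Record that regions u and v share a boundary."""
        self.edges.add(frozenset((u, v)))

    def adjacent(self, u, v):
        """True if u and v were recorded as adjacent (order does not matter)."""
        return frozenset((u, v)) in self.edges

    def relative_size(self, u, v):
        """R(u, v): the size of region v relative to region u."""
        return self.size[v] / self.size[u]
```

The same structure can serve for both the model RAG and the image RAG, for example `RegionAdjacencyGraph(color={"cup": "white", "rim": "blue"}, size={"cup": 400.0, "rim": 100.0})` followed by `add_adjacency("cup", "rim")`.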

5.2 Formulation of the Color-based Model-driven Selection Problem

In this section we will formulate the color-based model-driven selection
problem as a type of subgraph matching problem. Given the image region adjacency
graph, the model object, if present in the scene represented in the image, will
form a subgraph in I_G. The location strategy can be regarded as the problem
of searching for suitable subgraphs that satisfy the model description. Any such
subgraph I_g = <V_g, E_g, C_g, R_g, S_g> such that ||V_g|| ≤ ||V_m|| has
associated with it a node correspondence vector T = {(u_m, u_g) | u_m ∈ V_m,
u_g ∈ V_g ∪ {⊥}} (⊥ is a null match). Although there are an exponential number of such
subgraphs, not all of them correspond to the model RAG. From the model description
a set of unary and binary constraints can be derived (as is described later) that
make only some subgraphs feasible. A feasible subgraph is, therefore, a subgraph
that has all its nodes satisfying unary and binary constraints. For model-driven
selection, since it is desirable to have at most one image subgraph matching the
model RAG, we can select from among these subgraphs a subgraph(s) that in
some sense best satisfies the model description. Here we formulate color-based
model-driven selection as the problem of choosing a feasible subgraph(s) I_g that
minimizes the following measure:

\[
\mathrm{SCORE}(I_g) = \left(1 - \frac{\|V_g\|}{\|V_m\|}\right)
  + \frac{1}{\|E_m\|} \sum_{(u_m, v_m) \in E_m} R_{mg}(u_m, v_m, u_g, v_g)^2
\tag{13}
\]

where R_mg(u_m, v_m, u_g, v_g) is the change in the relative size when adjacent
model regions (u_m, v_m) are paired to corresponding image regions (u_g, v_g), and is
given by R_mg(u_m, v_m, u_g, v_g) = |R_m(u_m, v_m) - R_g(u_g, v_g)| /
max(R_m(u_m, v_m), R_g(u_g, v_g)). SCORE(I_g) emphasizes
rewards for making as many correspondences as possible, as indicated by the first
term, called Match(I_g), and penalties for a mismatch of the relative size, as
indicated by the second term, called Deviation(I_g), which measures the mean square
deviation of the relative sizes. Since the subgraphs are all feasible, the deviation
term accounts for occlusions and pose changes in a more refined way than the binary
constraints alone. Another advantage of this measure is that it can be incrementally
computed from individual region matches, so that a branch-and-bound
formulation can be used to reduce considerably the search involved in finding the
best subgraph (i.e., the one with the lowest score). Finally, the above formulation
is based on the hypothesis that at least one of the regions in the isolated
subgraph corresponds to a model region. It is also designed primarily to locate single
instances of the model object in the image. More instances can be found after
removing the regions in the found instance from the image RAG.
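
To make the two terms of the measure concrete, here is a hedged Python sketch. It assumes, as one plausible reading of the text, that the match term is one minus the fraction of model regions matched, that the deviation term is the sum of squared relative-size changes divided by the number of model adjacencies, and that the relative-size change is normalized by the larger of the two relative sizes. The function names are ours.

```python
def relative_size_change(r_model, r_image):
    """Normalized change in relative size between a model pair and an image pair.

    ASSUMPTION: absolute difference divided by the larger relative size,
    which keeps the value in [0, 1] for positive inputs.
    """
    return abs(r_model - r_image) / max(r_model, r_image)


def score(num_model_regions, num_model_edges, num_matched, size_changes):
    """Lower is better: few unmatched model regions and small size deviations."""
    match_term = 1.0 - num_matched / num_model_regions
    if num_model_edges == 0:
        deviation = 0.0  # a single-region model has no relative sizes to compare
    else:
        deviation = sum(d * d for d in size_changes) / num_model_edges
    return match_term + deviation
```

For example, matching two of four model regions with one paired adjacency whose relative size changed by half gives `score(4, 3, 2, [0.5])`, that is 0.5 for the match term plus 0.25/3 for the deviation term.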

5.3 A Color-based Model-driven Selection Mechanism 

A color-based model-driven selection mechanism was built using the above
formulation. The mechanism essentially uses a search strategy to find the best
subgraph. The result of selection is the correspondence vector associated with the
best subgraph. The search strategy used the following constraints to restrict the
search among feasible subgraphs.

1. Unary constraints: The color and absolute region size information provided in
the model description were used to develop unary constraints on these features.
The color C_g(u_g) of an image region u_g is said to match the color C_m(u_m) of a
model region u_m if these colors belong to the same category or compatible
categories (described in Section 2.4). With this scheme, brighter colors (of a given hue)
in the model could potentially match to darker colors of the same hue in
the image, thus accounting for a simple lowering of illumination levels. The bounds
on the absolute size provided by B_sm act as loose size constraints to rule out some
clearly absurd scale changes (such as, say, a 100-fold increase in the smallest model
region, implying a blowup of the model outside the image bounds).

2. Binary constraints: The adjacency (as well as non-adjacency) and relative size
information provided in the model were used as binary constraints to prune some
impossible subgraphs. Specifically, the lack of adjacency in model regions is a
powerful constraint, because two adjacent regions in the image cannot correspond
to two regions that are not adjacent in the given color description (assuming a rigid
model)⁴. Two adjacent regions in the model may, however, not appear adjacent in a
given image due to occlusion. A simple analysis of occlusions could rule out several
false matches in such cases (such as, say, discarding a match if the area spanned
by the occlusion within a rectangle enclosing the candidate non-adjacent image
regions far exceeds the combined size of the corresponding adjacent model regions).
The bound on the relative sizes served as another binary constraint. The bound
B_rm was used to constrain possible matches by requiring
R_mg(u_m, v_m, u_g, v_g) ≤ B_rm(u_m, v_m).

3. Searching for the best subgraph:

The search for the best subgraph (i.e., the subgraph that minimizes the value
of SCORE) can, in principle, be done by an exhaustive enumeration.
But with the algorithm described below, the search required is reduced to a great
extent. The algorithm used is essentially a variation of the branch and bound
interpretation tree (IT) search [12], with the major difference being that no
verification is done when the search reaches a leaf node (as the task is selection and
not recognition). Each level of the search tree represents a possible match for a model
region (this includes a null match), so that the depth of the search is bounded
by the number of nodes in the model RAG. The unary constraints are checked a
priori to prune the breadth of the search tree. A subgraph in the image RAG that
is a potential match for the model RAG is represented by a path in the IT. The
value of SCORE is updated at each node as
SCORE_{i+1} = SCORE_i - 1/||V_m|| + R_mg^2/||E_m||.
By keeping the lowest value of SCORE so far, search can be cut off below any node
with a Deviation(I_g) value greater than the lowest SCORE value. The
unary and binary constraints prune the search tree considerably, so that the
average number of full paths (up to the leaves) explored is small (~ 50). Finally, once
an instance of the model has been found in the image, the selected area
is removed and the search repeated on the resulting image RAG to look for more
instances of the model object.

⁴ Notice here that the search is for a given color view of the model.
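
The search described above can be sketched as a depth-first branch-and-bound over an interpretation tree. The sketch below is an illustration under stated assumptions, not the original implementation: `unary_ok`, `binary_ok` and `pair_cost` are hypothetical callbacks supplied by the caller, `pair_cost` is assumed to return the already normalized, non-negative contribution of a pair of matches to the deviation term, and each image region is used at most once.

```python
def best_subgraph(model_regions, image_regions, unary_ok, binary_ok, pair_cost):
    """Return (assignment, score) for the lowest-scoring feasible assignment.

    assignment maps every model region to an image region or to None
    (the null match); it is None if no model region could be matched.
    """
    n = len(model_regions)
    # Unary constraints are applied up front to limit the breadth of the tree.
    candidates = [
        [g for g in image_regions if unary_ok(m, g)] + [None]
        for m in model_regions
    ]
    best = {"score": float("inf"), "assignment": None}

    def extend(level, assignment, deviation):
        # Bound: the deviation never decreases, so a node whose deviation
        # already exceeds the best score cannot lead to a better leaf.
        if deviation > best["score"]:
            return
        if level == n:
            matched = sum(g is not None for g in assignment)
            if matched == 0:
                return  # at least one model region must be matched
            total = (1.0 - matched / n) + deviation
            if total < best["score"]:
                best["score"] = total
                best["assignment"] = dict(zip(model_regions, assignment))
            return
        m = model_regions[level]
        for g in candidates[level]:
            added = 0.0
            feasible = True
            if g is not None:
                if g in assignment:
                    continue  # image region already paired with a model region
                for prev_m, prev_g in zip(model_regions, assignment):
                    if prev_g is None:
                        continue
                    if not binary_ok(prev_m, m, prev_g, g):
                        feasible = False
                        break
                    added += pair_cost(prev_m, m, prev_g, g)
            if feasible:
                extend(level + 1, assignment + [g], deviation + added)

    extend(0, [], 0.0)
    return best["assignment"], best["score"]
```

No verification is done at the leaves, in keeping with the text. To look for further instances, a caller would remove the image regions in the returned assignment and run the search again.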

5.4 Results

The results of using color-based model-driven selection are illustrated in Figures
2 and 3. Figure 2a shows a model object, and its color description obtained by
using the color-region segmentation algorithm of Section 3 is shown in Figure
2b. Here the background was removed by a simple threshold on intensities. This
description is used to create a model RAG. Figure 2c shows a scene in which the
model object occurs. The scene shown has several other objects with one or more
of the model colors. Also, the model appears in a different pose here, being rotated
to the left about the vertical axis. Figure 3b shows the result of applying the unary
color constraints. The big blue glass matches the small blue flowers based on color
alone. Next, the unary constraint on absurd size changes is used to prune the
possibilities and the result is shown in Figure 3c. Finally, the subgraph with the
lowest value of SCORE is shown in Figure 3d. As can be seen from this figure,
a region containing most of the model object has been identified even though the
color image segmentation was not perfect (notice the small streak above the white
rim of the cup that merges with the book in the background).

5.5 Search Reduction using Color-based Model-driven Selection

The color-based model-driven selection mechanism provides a correspondence
of model regions to some image regions. The matching of model features to image
features can be restricted to within corresponding regions, and this reduces the
number of matches that need to be tried for recognition. To reduce the search
further, conventional grouping can be performed within the selected color regions
as described in Section 4.2. To estimate the search reduction in this case we
continue with the analysis done in that section. Let N_I be the number of solution
subgraphs given by the selection mechanism, and let I_k represent one such subgraph,
with the number of nodes = N_k. Let (G_{u_j}, G_{v_j}) = the number of groups in region
u_j of the solution subgraph I_k, and region v_j of the model RAG that matches
u_j as implied by the correspondence vector T associated with I_k. Then assuming,
as before, the average size of the group = g, the number of matches that need to be
tried is O(Σ_{k=1}^{N_I} Σ_{j=1}^{N_k} G_{u_j} G_{v_j} g^3 g^3). To compare this kind
of selection with pure grouping we can take some typical values of these numbers.
Letting M = 200, ..., the number of matches with grouping alone is found to be
O(G_M G_N g^3 g^3) matches; using model-driven color-based selection with grouping,
the number of matches becomes ≈ 1.25 × 10^8. Assuming 1 microsecond as the time per
match, this corresponds to a reduction in match time from 26 minutes to ≈ 2 minutes.
By trying several models and images of scenes where they occurred, we recorded the
average number of subgraphs generated by the model-driven selection mechanism. The
search estimates were obtained using the above formula for model-driven selection with
grouping, and the formulas for other methods mentioned in Section 4.2. The
results are shown in Table II. The bound on the number of groups in a region was
the same as used in Section 4.2. As can be seen from the table, the number of
matches using correspondence between model and image color regions is always
lower. A curious feature to note from the table is that it takes fewer
matches (and hence less time) for a more complex model (entry 1 in Table II)
containing several color regions, than for a simple object with fewer regions (entry
2 in Table II). This is understandable since, with a large number of regions, the
constraints are stronger and hence the false matches are fewer.
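
The match count for model-driven selection with grouping, and its conversion to matching time, can be sketched as below. The function names are ours, each solution subgraph is represented as a hypothetical list of (image-region groups, model-region groups) pairs, and the default of 1 microsecond per match is the figure assumed in the text.

```python
def model_driven_matches(solution_subgraphs, g):
    """Sum G_u * G_v * g^3 * g^3 over all matched region pairs of all subgraphs.

    solution_subgraphs: a list of subgraphs, each a list of (G_u, G_v) pairs,
    where G_u and G_v are the group counts of a matched image and model region.
    """
    return sum(
        g_u * g_v * g**3 * g**3
        for subgraph in solution_subgraphs
        for g_u, g_v in subgraph
    )


def match_time_minutes(num_matches, seconds_per_match=1e-6):
    """Convert a match count into minutes of matching time."""
    return num_matches * seconds_per_match / 60.0
```

As a rough check, 1.25 × 10^8 matches at 1 microsecond each come to about 2 minutes, and the grouping-only count G_M G_N g^3 g^3 with the Section 4.2 values comes to roughly 25 minutes, in line with the figures quoted above.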

Discussion: The above studies estimated the search reduction without actually
integrating the selection mechanism with a recognition system. Moreover, the
estimated search was based on the assumption that there were no false negatives
given by the selection mechanism. Such errors can occur, since a subgraph with the
lowest value of SCORE may not always indicate a match to the model. To estimate
the number of false positives, the number of false negatives, and the reduction in
search that results from this color-based selection mechanism, we have recently
developed a 3D from 2D recognition system and are currently testing it.
Preliminary results on using the selection mechanism as a front-end for recognition
have so far been encouraging.


6. SUMMARY 


In this paper we have shown how color can be used as a cue to perform both
data and model-driven selection. Unlike other approaches to color, we have used
the intended task to constrain the kind of color information to be extracted from
images. This led to a fast color image segmentation algorithm based on perceptual
categorization of colors to give perceptually different color regions. This color
description of the image formed the basis of data and model-driven selection. A
saliency measure was then developed to rank the color regions to perform data-
driven selection. Lastly, an approach to model-driven selection was presented that
exploited the description of model color regions to locate instances of the model in
the image. Finally, we regard color as one of the many cues that can be used for
selection. Future research is directed towards using other cues such as texture to
perform data and model-driven selection.

APPENDIX A

In this appendix we describe the psychophysical experiments done to derive the
color categories. The aim of these experiments was to record the perceptual
judgements of colors in different regions of the color space by a systematic
exploration of the color space. For this, the hue-saturation-value representation
of color space was used. As shown in Figure 7, the entire spectrum of computer
recordable colors (2^24 colors) was quantized into 7200 bins corresponding to a
5 degree resolution in hue, and 10 levels of quantization of saturation and
intensity values. In order to
scan the color space systematically, the colors in bins were observed starting with 
the bins of red hue and going around the color space back to the red hue again. The 
display setup involved a 24-bit high resolution monitor with appropriate monitor
calibration to observe the colors in dark room conditions with a minimum viewing
distance of 2 feet. Uniform color samples (mondrians) of size 64 x 64, corresponding
to the hue, saturation, and brightness values in each bin, were displayed on the
screen. The set of mondrians displayed on the screen varied in purity vertically,
and intensity horizontally, while the hue was kept constant. For each hue the
colors initially displayed had a resolution of 0.2 in brightness and saturation. Four
subjects were tested individually and were supplied with a chart that showed the 
gradations in brightness and purity varying in a manner that corresponded to the
color spectrum shown on the display. Each subject was then asked to group the
color samples displayed on the screen into perceptually uniform color groups and
mark the result on the chart provided, so that the end result was a segmentation 
of the chart into perceptually uniform colored groups. The presence of a boundary 
was taken to mark a change in color category. To precisely locate this boundary, 
the color samples around the boundary were redisplayed with a finer resolution (of
0.1) in brightness and saturation. Before assigning a new category label, each group
was compared with groups of the previous hue by displaying the colors in the previous
group along with the given group and asking the subject to judge if this group could
be merged with the previous hue groups. The observation of successive mondrians
was done with 10 minute intervals in between to remove after-effects of the
previous display. The mondrians displayed were sufficiently apart on the screen to
keep the effects of simultaneous contrast small. By averaging out the differences 
in the responses between subjects, we found about 220 different color categories 
were sufficient to describe the color space. The color category information was 
then summarized in a color-look-up table. 
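
The quantization and table look-up described in this appendix can be sketched as
follows. This is only an illustrative sketch, not the code used in the experiments:
the function names, the use of Python's colorsys conversion, the assumption of
uniformly spaced saturation and intensity levels, and the single "red" label are
all hypothetical. The roughly 220 experimentally derived categories are not
reproduced here, so every other bin is left unlabelled.

```python
import colorsys

HUE_STEP_DEG = 5                    # 5 degree hue resolution (from the text)
N_HUE = 360 // HUE_STEP_DEG         # 72 hue bins
N_SAT = 10                          # 10 saturation levels (from the text)
N_VAL = 10                          # 10 intensity (value) levels (from the text)
N_BINS = N_HUE * N_SAT * N_VAL      # 72 * 10 * 10 = 7200 bins

UNLABELLED = None                   # placeholder for bins with no category yet


def _level(x, n):
    """Map x in [0, 1] to a level 0..n-1; x == 1.0 falls in the top level."""
    return min(int(x * n), n - 1)


def hsv_bin(r, g, b):
    """Return (hue_bin, sat_bin, val_bin) for an 8-bit RGB color."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # Round to suppress floating-point error at exact bin edges (e.g. 120 degrees).
    hue_deg = round(h * 360.0, 9)
    hue_bin = int(hue_deg // HUE_STEP_DEG) % N_HUE
    return hue_bin, _level(s, N_SAT), _level(v, N_VAL)


def bin_index(hue_bin, sat_bin, val_bin):
    """Flatten a (hue, saturation, value) bin triple into an index 0..N_BINS-1."""
    return (hue_bin * N_SAT + sat_bin) * N_VAL + val_bin


def categorize(r, g, b, lut):
    """Look up the color category of an RGB color in a table of N_BINS entries."""
    return lut[bin_index(*hsv_bin(r, g, b))]


# Hypothetical look-up table: only one bin is labelled, purely for illustration.
lut = [UNLABELLED] * N_BINS
lut[bin_index(*hsv_bin(255, 0, 0))] = "red"
```

With such a table, classifying a pixel costs one color-space conversion and one
array access, which is consistent with the emphasis on a fast segmentation
algorithm; two nearby colors such as (255, 0, 0) and (250, 5, 5) fall into the
same bin and therefore receive the same label.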
Figure 1: Illustration of color region segmentation and color-saliency. (a) Input image depicting a
scene of objects of different materials and having occlusions and inter-reflections. (b) Segmented image
using the color region segmentation algorithm. (c)-(f) The four most distinctive regions detected
using the color-saliency measure. The white portion in the red book appears so because of the white
background.


Figure 2: Illustration of model-driven selection: model and scene. (a) The object serving as the
model. (b) Its color description produced by the segmentation algorithm of Section 3. (c) A cluttered
scene in which the object appears.


Figure 3: Illustration of color-based model-driven selection. (a) A scene containing the model object
of Figure 2a. (b) Regions selected based on the unary color constraint. (c) Regions of (b) pruned after
using the unary size constraint. (d) Regions corresponding to the best subgraph that matched the
model specifications.


Figure 4: Illustration of color region segmentation and color-saliency. (a) Input image consisting of
regions of 3 different colors: red, green and blue against an almost white background. (b) Result of
step 2 of the algorithm with regions colored differently from the original image. (c) Final segmentation
of the image of Fig. 3a. (d)-(f) The three most distinctive regions found using the color-saliency
measure.




Figure 5: Illustration of color region segmentation and color-saliency: another example. (a) Input
image of a set of colored cloth materials. (b) Regions obtained at the end of step 2 of the algorithm
(before merging overlapping regions). (c) Final segmented image suitably recolored to show the segmented
regions. (d)-(f) The three most distinctive regions found using the color-saliency measure.



Figure 6: Illustration of color region segmentation and color-saliency: last example. (a) Input
image depicting a scene of different kinds of objects (cloths and polished book). (b) The color regions
extracted from (a) using the color region segmentation algorithm. (c)-(f) The four most distinctive
regions detected using the color-saliency measure.

Figure 7: Illustration of the quantization of the hsv-color-space. (a) The hsv-color model. (b) A cell of
the quantized color space. (c) The quantization data and the number of categories obtained.

Figure 8: Graphs of weighting functions used in devising the color-saliency measure.